ABSTRACT

Motivated by recent results on lognormal statistics showing that the moment hierarchy of a lognormal variable completely fails to capture its information content in the large-variance regime, we discuss in this work the inadequacy of the hierarchy of correlation functions to describe a correlated lognormal field, which provides a roughly accurate description of the non-linear cosmological matter density field. We present families of fields having the same hierarchy of correlation functions as the lognormal field at all orders. This explicitly demonstrates the known though little-studied fact that the correlation function hierarchy never provides a complete description of a lognormal field, and that it fails to capture information in the non-linear regime, where other simple observables are left totally unconstrained. We discuss why perturbative, Edgeworth-like approaches to statistics in the non-linear regime, common in cosmology, can never reproduce or predict that effect, and why it is nevertheless generic for tailed fields, hinting at a breakdown of the perturbation theory based on the field fluctuations. We make a rough but successful quantitative connection to N-body simulation results, which showed that the spectrum of the log-density field carries more information than the spectrum of the density field itself once the non-linear regime is entered.



Subject headings: cosmology: theory; cosmology: observations; large-scale structure of universe; methods: statistical
1. INTRODUCTION

The non-linear regime of structure formation in the Universe is at the heart of highly challenging problems for statistical inference. This regime is also potentially very rewarding, due to the large number of modes present. From the point of view of statistics, the overall picture of the linear regime is in principle very simple. Since fluctuations are believed to obey Gaussian statistics at early times, an optimal description is furnished by the two-point correlation function, equivalently by the (power) spectrum, the second member of the hierarchy of the n-point correlation functions (White 1979; Peebles 1980; Fry 1985; Bernardeau et al. 2002).
The question of optimality is, however, far from clear once we leave the linear regime, and is still clearly out of reach, due to our inability to model and handle accurately very high-dimensional (field) statistics beyond the Gaussian. Besides the difficulties inherent in accurately modeling the observables on these scales, which can be approached with perturbation theory or N-body simulations, statistical inference also faces other types of problems. For instance, it was shown that surprisingly little information is to be extracted from the spectrum on these scales (Rimes & Hamilton 2005; Neyrinck et al. 2006; Lee & Pen 2008), due to the appearance of heavy correlations between the modes. A recent approach using local transforms of the field prior to the extraction of the spectrum, also applied to its weighted projection, the weak-lensing convergence field, was shown to be successful at recapturing much information (Neyrinck et al. 2009; Seo et al. 2011a; Neyrinck et al. 2011; Seo et al. 2011b; Yu et al. 2011; Joachimi et al. 2011; Neyrinck 2011).

This holds at least in the absence of discreteness or shape-noise issues. While it is not yet entirely clear to what extent such improvements can propagate to improvements from galaxy surveys or other sorts of data, it opens a new perspective on the statistics and the description of non-linear fields. The success of these transforms, and the nearly diagonal covariance of the spectrum that they produce up to much smaller scales, suggest that a lognormal picture is not inaccurate. That is, ln(1 + δ) may not be too far from a Gaussian field on these scales. Some other tentative arguments for, and to some extent confirmations of, this picture in lower dimensionality have been known for a long time, in a variety of contexts (e.g. Coles & Jones 1991; Bernardeau & Kofman 1995; Matsubara & Yokoyama 1996; Taylor & Watts 2000; Hilbert et al. 2011).
Besides the fact that higher-order correlations may carry information, another process of statistical relevance, largely ignored in cosmology, is at work for tailed fields. The correlation function hierarchy need no longer provide a complete description of a field in this regime, so that higher-order statistics may fail to capture additional pieces of information, as first pointed out by Coles & Jones (1991). This possibly means that these results on the log transform of the matter field not only bring back information from higher-order statistics, but also information that was lost to the hierarchy. The one-dimensional lognormal distribution is a known instance where the moment hierarchy does not fully specify its statistics. Explicit examples of other one-dimensional distributions with the same moments are known (Heyde 1963). For this reason, the correlation function hierarchy cannot fully specify the statistics of a lognormal field. The first quantitative evaluation of this effect, exact in one dimension, has shown that it has a huge impact on the efficiency with which cosmological parameters can be extracted from the moment hierarchy in the non-linear regime (Carron 2011). As pointed out by Coles & Jones (1991), this is a generic effect for tailed fields. Both the matter field and, even more so, the convergence field (Das & Ostriker 2006; Takahashi et al. 2011) show large tails in the non-linear regime. Using fits to simulations, this effect was indeed shown to affect parameter inference from the one-dimensional distribution of the convergence field (Carron 2012). Very little is, however, known in higher-dimensional settings. It is therefore important to gain more insight into these issues, since they strongly suggest a fundamental limitation of the correlation function hierarchy in the non-linear regime.

The main purpose of this work is to make the existence of this effect within any correlated lognormal field and its correlation function hierarchy obvious. To this aim, nothing can be better than an explicit example. In Section 3 we therefore present families of fields all having the same hierarchy of correlation functions at all orders as the lognormal field, for any mean and two-point correlation function. We show in this light why this effect is irrelevant in the linear regime, but not in the non-linear regime. Before turning to these aspects, we discuss in Section 2 why more standard approaches in cosmology, of perturbative nature, while of course perfectly sound in the weakly non-linear regime, can never predict or reproduce this effect. These are presumably the reasons why this effect has been so little studied in cosmology so far, and they are worth a few comments. In Section 4 we then make a successful connection to the recent simulation results mentioned above, and we conclude in Section 5. The appendix collects proofs of key statements in Section 3.



1.1. Notation and definitions 

We will be dealing with random vectors ρ = (ρ_1, ..., ρ_d), being the sample of a field,

$$\rho_i = \rho(x_i) > 0. \qquad (1)$$

For a vector n = (n_1, ..., n_d) of non-negative integers (multi-index), we write ρ^n for the monomial in d variables,

$$\rho^{\mathbf n} = \rho(x_1)^{n_1}\cdots\rho(x_d)^{n_d}. \qquad (2)$$

Throughout this work, we reserve bold letters exclusively for vectors of integers. Let p_ρ(ρ) be a d-dimensional probability density function such that all correlations of the form ⟨ρ^n⟩ exist. We write m_n for the moment ⟨ρ^n⟩. Explicitly,

$$m_{\mathbf n} = \left\langle \rho^{n_1}(x_1)\cdots\rho^{n_d}(x_d)\right\rangle. \qquad (3)$$

Correlations of order n are given by moments such that the order |n| of the multi-index, defined as

$$|\mathbf n| := \sum_{i=1}^{d} n_i, \qquad (4)$$

is equal to n. We call these quantities moments or correlations of order n. These moments coincide with the values of a continuous n-point correlation function on the grid sampled by (x_1, ..., x_d). We write δ for the dimensionless fluctuation field, and A for the field defined by ln ρ,

$$A := \ln\rho, \qquad \delta := \frac{\rho - \bar\rho}{\bar\rho}. \qquad (5)$$

Such assignments involving ratios or logarithms of d-dimensional quantities should be understood component by component.

2. THE PROBLEM WITH TAILED FIELDS 

In one dimension, the fact that the hierarchy does not always fully specify the distribution is a well-known and still active topic of research in the mathematical theory of moments (see Shohat & Tamarkin 1963; Akhiezer 1965; Simon 1997 for classical references). The moment problem is to find a distribution corresponding to a given moment sequence. When a unique solution exists, the moment problem is called determinate. When several exist (in this case always infinitely many), it is called indeterminate. We refer to Coles & Jones (1991) and Carron (2011) for a discussion in a cosmological context and more references. The theory of the moment problem in several dimensions is less developed, but typical criteria that guarantee determinacy, or indeterminacy, linked to the decay rate of the distribution, stay basically unchanged. Guiding us throughout the discussion in this section will be the following instance: for any dimension d, if

$$\left\langle e^{c|\rho|}\right\rangle < \infty, \qquad |\rho| = \left(\rho_1^2 + \cdots + \rho_d^2\right)^{1/2}, \qquad (6)$$

for some c > 0, then the moment problem corresponding to the moments of that distribution is determinate (Dunkl & Xu 2001, Theorem 3.1.17). By a 'tailed' distribution, we have in mind in this work a decay at infinity that is slower than exponential, and thus for which this criterion fails. In this regime, there may thus be several distributions with the same hierarchy of correlations.
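As a one-dimensional illustration of what 'tailed' means here (a standard computation, added for clarity), the lognormal variable ρ = e^A, with A Gaussian of mean Ā and variance σ_A², fails criterion (6) for every c > 0, although all of its moments are finite:

$$\left\langle e^{c\rho}\right\rangle = \frac{1}{\sqrt{2\pi\sigma_A^2}}\int_{-\infty}^{\infty}\exp\left(c\,e^{A} - \frac{(A-\bar A)^2}{2\sigma_A^2}\right)dA = \infty, \qquad \left\langle\rho^n\right\rangle = \exp\left(n\bar A + \frac{n^2\sigma_A^2}{2}\right) < \infty.$$

The integral diverges because c e^A eventually outgrows the quadratic term in the exponent. Since (6) is only a sufficient condition, failing it does not by itself prove indeterminacy; for the lognormal, indeterminacy is established by the explicit families of Section 3.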

2.1. On its relevance for parameter inference 

It should be clear why this can, in general, have a dramatic impact on parameter inference from correlations. Imagine a series of distributions with identical correlations at all orders, one of these distributions being the one that actually describes the observations. Since the distributions are different, they will in general make different predictions for observables other than the correlations. Pick, for definiteness, an observable ⟨f(ρ)⟩(α) with different predictions among this family of distributions, α being any model parameter. Knowledge of the entire hierarchy is unable to distinguish between these different predictions for ⟨f⟩, since they result from equally valid distributions. If α enters the true distribution in such a way that it makes a sharp prediction on the value of ⟨f⟩, this is highly valuable information, definitely lost to an analyst extracting correlations exclusively. On the other hand, this argument also lets us see that this effect can become relevant only when perturbation theory breaks down. If the fluctuation field δ is small, f can be expanded in powers of δ, and thus ⟨f⟩ can be obtained uniquely from the correlation hierarchy of δ.

There is a remarkable way to understand what is happening here in terms of Fisher information, familiar to cosmologists. Recall that the Fisher information matrix F_αβ associated with a probability density function p(ρ|α, β, ...) is defined as

$$F_{\alpha\beta} = \left\langle \frac{\partial\ln p}{\partial\alpha}\,\frac{\partial\ln p}{\partial\beta}\right\rangle. \qquad (7)$$
Among the many properties that make it a meaningful measure of information are its positivity, its additivity for independent variables, its invariance under invertible transformations, the Cramér-Rao bound and the information inequality, which states that any set of observables carries at most the same amount of Fisher information as p itself. See Fisher (1925), Rao (1973) and van den Bos (2007) for references to statistical works, and Jungman et al. (1996a,b), Tegmark (1997) and Tegmark et al. (1997) for the first implementations in cosmology, for Gaussian variables. We refer to Carron et al. (2011) for an extensive discussion of the information inequality in a cosmological context, and of its deep connection with the concept of entropy. It is a fact that the Fisher information content on α of the distribution is entirely within the first n correlations if the function ∂_α ln p is a polynomial of order n. In particular, the distributions for which the Fisher information matrix is entirely within the hierarchy are precisely those for which the functions ∂_α ln p can be written as a power series over the range of ρ. If not, the mean squared residual to the best series expansion is the amount of Fisher information absent from the hierarchy (see Carron 2011 for a proof). It is simple to show that criterion (6), which guarantees that the distribution is uniquely set by its correlations, also implies that the entire amount of Fisher information is within the hierarchy. This follows from the very next theorem of the same reference (Dunkl & Xu 2001, Theorem 3.1.18), which states that the polynomials in the d variables form a dense set of functions with respect to the least mean squared residual criterion if (6) is met. In particular, the functions ∂_α ln p can then be arbitrarily well approximated by polynomials with respect to that criterion, and therefore the correlations contain all of the Fisher information.
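The decomposition just described can be made concrete for a one-dimensional lognormal variable. The sketch below is our illustration, not the computation of Carron (2011): it takes the parameter to be the mean μ of ln ρ (an assumption chosen because its full information, 1/σ_A², is simple), projects the score onto polynomials of order N through F_N = b·C⁻¹b, with b_n = ∂_μ⟨ρ^n⟩ and C the covariance of ρ, ..., ρ^N, and returns the fraction F_N σ_A². For N = 1 this fraction is σ_A²/(e^{σ_A²} - 1) in closed form, a convenient check.

```python
import numpy as np

def lognormal_moment(k, sigma2, mu=0.0):
    """<rho^k> for ln(rho) ~ Normal(mu, sigma2)."""
    return np.exp(k * mu + 0.5 * k**2 * sigma2)

def moment_information_fraction(sigma2, N):
    """Fraction of the Fisher information on mu carried by <rho>, ..., <rho^N>.

    F_N = b^T C^{-1} b with b_n = d<rho^n>/dmu and C_nm = <rho^(n+m)> - <rho^n><rho^m>,
    i.e. the squared norm of the projection of the score onto polynomials of order <= N.
    The full information on mu is 1 / sigma2, so the fraction is F_N * sigma2.
    """
    n = np.arange(1, N + 1)
    m = lognormal_moment(n, sigma2)
    b = n * m                                          # d<rho^n>/dmu = n <rho^n>
    C = lognormal_moment(n[:, None] + n[None, :], sigma2) - np.outer(m, m)
    scale = np.sqrt(np.diag(C))                        # rescale to tame the dynamic range
    R = C / np.outer(scale, scale)
    F_N = (b / scale) @ np.linalg.solve(R, b / scale)
    return F_N * sigma2

# sigma_A^2 values; the corresponding sigma_delta is sqrt(exp(sigma_A^2) - 1)
for sigma2 in (0.01, np.log(2.0), 1.0):
    print(sigma2, [round(moment_information_fraction(sigma2, N), 4) for N in (1, 2, 3)])
```

Only low N are used because the covariance matrix of high powers of a lognormal variable becomes numerically ill-conditioned very quickly.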

It is important to note that if criterion (6) happens to be met due to a cutoff at a large value ρ_cut on an otherwise tailed distribution, the correlations are still poor probes for any practical purpose. Consider, for instance, a variable that is lognormal over a very long range but decays quickly at infinity, starting from ρ_cut. If ρ_cut is large enough, the correlations of order up to, say, 2N will be identical to those of the lognormal for all practical purposes. Since the information content of the first N correlations depends on the first 2N only, they will be equally poor probes as for the lognormal: they will contain the exact same amount of Fisher information as those of the lognormal. It is the correlations of order > N, which are able to feel the cutoff, that make up for the difference between the total information content of the lognormal distribution and its correlation hierarchy (if the cutoff is at a large enough value, from (7) the two distributions have the same total amount of information). The hierarchy is thus still not well suited for the analysis of data in this regime.

For the same reason, even though any lognormal field is indeterminate, this effect plays no role for parameter inference in the linear regime, when the actual range of the variables is still small and the tail at infinity is not yet felt. This is because in this regime, on the one hand, the lognormal is still very close to a Gaussian over the range where it takes substantial values, so that the lowest-order correlations still contain most of the Fisher information; on the other hand, a few higher-order terms are able to reproduce deviations of the functions ∂_α ln p from the Gaussian very accurately over this small range. This is consistent with the findings in Section 3, showing that the families presented there are indistinguishable from the lognormal for any practical purpose in the linear regime.

2.2. On other approaches to non-Gaussian statistics

Let us comment, in light of criterion (6), on typical perturbative approaches used in cosmology to parametrize (weakly) non-Gaussian distributions. These involve moments, as in Gram-Charlier and Edgeworth expansions, or the relation between the moment generating function and the distribution, in one or several dimensions (e.g. Fry 1985; Bernardeau 1994; Colombi 1994; Juszkiewicz et al. 1995; Bernardeau & Kofman 1995; Blinnikov & Moessner 1998). It is therefore interesting to see to what extent they fit into this picture. Typically, when applied to the δ field, to first order these parametrize the non-Gaussianity through a polynomial with coefficients involving the cumulants, or equivalently the moments, of the variable. Schematically,

$$p_\nu(\nu) \propto e^{-\nu^2/2}\left(1 + \alpha_3 H_3(\nu) + \alpha_4 H_4(\nu) + \cdots\right), \qquad (8)$$

with ν = δ/σ_δ. The coefficient α_i depends on the first i moments. The correction is given in terms of the Hermite polynomials H_n, which are the orthogonal polynomials associated with the Gaussian distribution. Such expansions never produce a tailed distribution, in the sense that (6) is always met: the decay of the distribution is still Gaussian. Now, to first order and over the range of ρ, equation (8) is equivalent to

$$\ln p_\nu(\nu) \approx \mathrm{const} - \frac{\nu^2}{2} + \alpha_3 H_3(\nu) + \alpha_4 H_4(\nu) + \cdots \qquad (9)$$

Therefore, the functions ∂_α ln p will have close to polynomial form. This is perfectly consistent with the decomposition of the Fisher information given above. Indeed, this expansion creates a probability density whose Fisher information content is within the moments that were used to build it. This is another way to see that moment-indeterminate distributions cannot be produced by perturbative expansions.
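For concreteness, the simplest member of this class, the Gram-Charlier series for the standardized variable ν (a standard textbook form, quoted here for illustration), reads

$$p_\nu(\nu) = \frac{e^{-\nu^2/2}}{\sqrt{2\pi}}\left[1 + \frac{\kappa_3}{3!}H_3(\nu) + \frac{\kappa_4}{4!}H_4(\nu) + \cdots\right], \qquad H_3(\nu) = \nu^3 - 3\nu, \qquad H_4(\nu) = \nu^4 - 6\nu^2 + 3,$$

so that α_3 = κ_3/6 and α_4 = κ_4/24, where κ_3 and κ_4 are the third and fourth cumulants of ν. In the notation common in perturbation theory, κ_3 = S_3σ_δ and κ_4 = S_4σ_δ²; Edgeworth expansions instead order the terms by powers of σ_δ. If, for illustration, a model parameter α (not to be confused with the coefficients α_i) enters only through these coefficients at fixed σ_δ, equation (9) gives

$$\frac{\partial\ln p_\nu}{\partial\alpha} \approx \frac{\partial\alpha_3}{\partial\alpha}H_3(\nu) + \frac{\partial\alpha_4}{\partial\alpha}H_4(\nu),$$

a polynomial of order four in ν, so that, by the decomposition of the Fisher information above, all the information on α sits in the first four moments.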

3. FIELDS WITH THE SAME HIERARCHY OF 
CORRELATION FUNCTIONS. 

After reviewing the basic properties of correlated lognormal variables, we present both continuous and discrete families that have the same correlations as the lognormal at all orders, for any dimensionality d. In fact, it turns out that a stronger statement is true: for these families, all observables of the form

$$\left\langle \rho(x_1)^{n_1}\cdots\rho(x_d)^{n_d}\right\rangle, \qquad n_i = \ldots, -1, 0, 1, \ldots \qquad (10)$$

are identical to those of the lognormal field, i.e. any n_i can also be negative. Adding the hierarchy of inverse powers and 'mixed' powers to the usual hierarchy thus still does not provide a complete description.

These families are generalizations, to any number of dimensions, means and two-point correlations, of known one-dimensional examples that can be found in the statistical literature (Heyde 1963; Stoyanov 1987).

Requirements such as homogeneity and isotropy are actually not needed in this section. In particular, unless otherwise specified, Ā is a d-dimensional mean vector (Ā(x_1), ..., Ā(x_d)), whose components can in principle differ. Nevertheless, the picture we have in mind is that of statistically homogeneous and isotropic fields in a box of volume V, where some set of Fourier modes from k_min to k_max can be probed. The corresponding Fourier representation of the two-point correlations, in a continuous notation, is

$$\left[\Sigma_{A,\delta}\right]_{ij} = \xi_{A,\delta}(x_i - x_j) = \int\frac{d^3k}{(2\pi)^3}\,P_{A,\delta}(k)\,e^{ik\cdot(x_i - x_j)}, \qquad (11)$$

where the integral runs over these modes, and ξ_{A,δ}(r) is the ordinary two-point correlation function of δ or A. The matrix inverse is given by

$$\left[\Sigma^{-1}_{A,\delta}\right]_{ij} = \int\frac{d^3k}{(2\pi)^3}\,\frac{1}{P_{A,\delta}(k)}\,e^{ik\cdot(x_i - x_j)}. \qquad (12)$$

This representation allows us to define a bit more rigorously what we mean by linear and non-linear lognormal fields, or by the linear and non-linear regime, in the following discussion: if needed, these can be formally defined by P_A(k) → 0 or P_A(k) → ∞ respectively, for all k.
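A discrete analogue of equations (11) and (12) may help make the statement about the matrix inverse concrete. The sketch below is an illustration under simplifying assumptions of ours: a periodic one-dimensional grid with unit spacing instead of a three-dimensional box, and a hypothetical positive spectrum. On such a grid a homogeneous covariance matrix is circulant, its eigenvalues are the spectrum values, and its inverse follows by replacing P with 1/P.

```python
import numpy as np

def circulant_from_spectrum(spec):
    """Return C with C[i, j] = (1/d) * sum_k spec[k] * exp(2j*pi*k*(i - j)/d).

    Discrete, unit-spacing analogue of equation (11) on a periodic 1D grid;
    `spec` must satisfy spec[k] == spec[-k] so that C is real.
    """
    d = spec.size
    xi = np.fft.ifft(spec).real             # xi[r] = (1/d) sum_k spec[k] e^{2 pi i k r / d}
    lag = (np.arange(d)[:, None] - np.arange(d)[None, :]) % d
    return xi[lag]                           # C depends on i - j only (homogeneity)

d = 64
k = np.abs(np.fft.fftfreq(d))                # |k| in cycles per cell, symmetric under k -> -k
spec = 1.0 / (1.0 + (k / 0.05) ** 2)         # hypothetical positive spectrum

cov = circulant_from_spectrum(spec)          # analogue of eq. (11)
cov_inv = circulant_from_spectrum(1.0 / spec)   # analogue of eq. (12): same form with 1/P

print(np.allclose(cov @ cov_inv, np.eye(d)))
```

The volume and grid-spacing factors of the continuous notation are absorbed here by the unit spacing and the 1/d normalization of the discrete Fourier transform.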

3.1. Basic properties of lognormal fields 

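We say the field ρ := (ρ(x_1), ..., ρ(x_d)) is lognormal if the d-dimensional probability density function for A is Gaussian,

$$p_A(A) = \frac{1}{\left((2\pi)^d\,|\Sigma_A|\right)^{1/2}}\exp\left(-\frac12\,(A - \bar A)\cdot\Sigma_A^{-1}(A - \bar A)\right), \qquad (13)$$

where Ā is the mean vector of A, and Σ_A its covariance matrix,

$$\left[\Sigma_A\right]_{ij} = \left\langle\left(A(x_i) - \bar A(x_i)\right)\left(A(x_j) - \bar A(x_j)\right)\right\rangle. \qquad (14)$$

The probability density for the vector ρ itself is then a d-dimensional lognormal distribution, which we denote for further reference by p_ρ^LN:

$$p^{\mathrm{LN}}_\rho(\rho) := \frac{p_A(\ln\rho)}{\rho_1\cdots\rho_d}, \qquad (15)$$

where the denominator is the Jacobian of the map A = ln ρ. The means and two-point correlations of A and δ are in one-to-one correspondence. We have

$$\bar A = \ln\bar\rho - \frac12\,\sigma_A^2, \qquad (16)$$

where σ_A² is the diagonal of Σ_A, i.e. the variances at the individual d points. Also,

$$\left[\Sigma_A\right]_{ij} = \ln\left(1 + \left[\Sigma_\delta\right]_{ij}\right), \qquad \left[\Sigma_\delta\right]_{ij} := \left\langle\delta(x_i)\,\delta(x_j)\right\rangle. \qquad (17)$$

In particular, the variances are related through

$$\sigma_A^2 = \ln\left(1 + \sigma_\delta^2\right). \qquad (18)$$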
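Relations (16) and (17) can be checked by direct sampling. The sketch below draws a correlated Gaussian A at two points (the mean vector and covariance are hypothetical values), exponentiates it, and compares the sample mean of ρ and the sample covariance of δ with the predictions; agreement is only up to Monte Carlo noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-point example: mean vector and covariance of A = ln(rho)
A_bar = np.array([0.1, -0.2])
Sigma_A = np.array([[0.25, 0.10],
                    [0.10, 0.36]])

A = rng.multivariate_normal(A_bar, Sigma_A, size=1_000_000)
rho = np.exp(A)

# Equation (16) solved for the mean of rho: rho_bar = exp(A_bar + sigma_A^2 / 2)
rho_bar = np.exp(A_bar + 0.5 * np.diag(Sigma_A))
delta = rho / rho_bar - 1.0

# Equation (17): Sigma_delta = exp(Sigma_A) - 1, element by element
Sigma_delta_pred = np.expm1(Sigma_A)
Sigma_delta_mc = delta.T @ delta / len(delta)     # <delta_i delta_j>, using <delta> = 0

print(rho.mean(axis=0), rho_bar)
print(Sigma_delta_mc)
print(Sigma_delta_pred)
```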

3.2. Continuous family 

Define the statistics of ρ = (ρ(x_1), ..., ρ(x_d)) as follows. Pick a real number ε with |ε| < 1. Pick further a set of angular frequencies ω = (ω_1, ..., ω_d), each of which must be an integer. Fix p_ρ^LN(ρ), the d-dimensional lognormal distribution with mean Ā and covariance matrix Σ_A defined above. Then set

$$p_\rho(\rho) := p^{\mathrm{LN}}_\rho(\rho)\left[1 + \epsilon\,\sin\left(\pi\,\omega\cdot\Sigma_A^{-1}(A - \bar A)\right)\right]. \qquad (19)$$

Since |ε| < 1, this is positive, and it is normalized because the sine is odd in A - Ā; it is thus a well-defined probability density function.³ The claim that p_ρ(ρ) defined in this way has the same moments m_n as the lognormal for any multi-index n is proved in the appendix. Note that in the above definition, Ā is the quantity that enters the definition of lognormal variables in equation (13). It is, however, no longer the mean of A = ln ρ when ρ is defined through (19).
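The claim can be checked numerically in one dimension, where Σ_A⁻¹ = 1/σ_A² and (19) reduces to a Gaussian in z = A - Ā modulated by 1 + ε sin(πωz/σ_A²). The sketch below (our illustration, with hypothetical parameter values) integrates ⟨e^{kz}⟩ on a fine uniform grid and compares it with the lognormal value exp(k²σ_A²/2) for integer k of both signs. Analytically, the modulation contributes Im exp((k + iπω/σ_A²)²σ_A²/2), which is proportional to sin(πωk) and vanishes for integer k and ω; this is why ω must be an integer.

```python
import numpy as np

def continuous_family_moment(k, s, eps=0.1, omega=1):
    """<rho^k> / exp(k * A_bar) for the 1D member of eq. (19), with sigma_A^2 = s.

    With z = A - A_bar, the density of z is the Gaussian N(0, s) times
    1 + eps * sin(pi * omega * z / s), since Sigma_A^{-1} = 1/s in one dimension.
    """
    half_width = abs(k) * s + 15.0 * np.sqrt(s)   # covers the peak of e^{kz} times the Gaussian
    z = np.linspace(-half_width, half_width, 400_001)
    dz = z[1] - z[0]
    gauss = np.exp(-0.5 * z**2 / s) / np.sqrt(2.0 * np.pi * s)
    density = gauss * (1.0 + eps * np.sin(np.pi * omega * z / s))
    return np.sum(np.exp(k * z) * density) * dz  # integrand is negligible at both ends

s = np.log(2.0)                                    # sigma_A^2 at sigma_delta = 1, eq. (18)
for k in range(-3, 4):
    print(k, continuous_family_moment(k, s), np.exp(0.5 * k**2 * s))   # family vs lognormal
```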

The functional form of p_ρ(ρ) consists of the lognormal envelope modulated by sinusoidal oscillations in A. The smaller the two-point function, the higher the frequency of the oscillations. This may sound curious at first, since it seems to imply that the more linear the field, the more different the distributions within this family will appear. However, it is precisely when the oscillations are strongest that this effect is least relevant. This can be seen as follows. Taking the average of any function f with respect to p_ρ trivially leads to

$$\langle f\rangle = \langle f\rangle_{\mathrm{LN}} + \epsilon\left\langle f\,\sin\left(\pi\,\omega\cdot\Sigma_A^{-1}(A - \bar A)\right)\right\rangle_{\mathrm{LN}}, \qquad (20)$$

where the subscript LN denotes the average with respect to the lognormal distribution. In the very linear regime, other terms fixed, the second term averages out to zero for any reasonable f, since it is the integral of a highly oscillating function weighted by a smooth integrand. In the non-linear regime this in general ceases to be the case. This is illustrated by the solid lines in Figures 1 (σ_δ = 1) and 2 (σ_δ = 0.1), showing the member of that family in one dimension with minimal frequency ω = 1 and ε = 0.1. The dotted lines in these figures are the usual Gaussian for A - Ā = z.

³ For d = 1, there are very slight differences with Heyde's original family. Heyde unnecessarily writes 2π instead of π, and restricts ε and ω to be positive.

Fig. 1. Three different one-dimensional distributions for z = A - Ā, with identical moments ⟨ρ^n⟩, ρ = e^A, for all integers n, positive or negative. The dashed line is the zero-mean Gaussian distribution, so that ρ is lognormal. The solid line is the member of the family in (19) with the lowest possible frequency, and amplitude ε = 0.1. The discrete one is (25) with shift parameter a = 0.25. They are shown at the scale of non-linearity σ_δ = 1, where this indeterminacy starts to become very relevant for inference. The families in any dimension are qualitatively identical to these.

Fig. 2. Same as Figure 1 for σ_δ = 0.1, when the indeterminacy is far less relevant for inference, for the reasons given in the text. The discrete distribution has been scaled by a constant factor for convenience.

The probability density function for ln ρ is no longer purely Gaussian. It is therefore of interest to see how the correlations of A deviate from those of Gaussian variables. For instance, the means ⟨A - Ā⟩ no longer vanish as they do for the lognormal. A straightforward calculation leads to

$$\left\langle (A - \bar A)(x_i)\right\rangle = \epsilon\,\pi\,\omega_i\exp\left(-\frac{\pi^2}{2}\,\omega\cdot\Sigma_A^{-1}\omega\right). \qquad (21)$$

Picking ω with a single non-zero entry, ω, at x_i, and using [Σ_A⁻¹]_ii ≥ 1/σ_A²(x_i), with equality when x_i is uncorrelated with the other points, we get that they can be as large as

$$\left\langle (A - \bar A)(x_i)\right\rangle = \epsilon\,\pi\,\omega\exp\left(-\frac{\pi^2\omega^2}{2}\left[\Sigma_A^{-1}\right]_{ii}\right) \le \epsilon\,\pi\,\omega\exp\left(-\frac{\pi^2\omega^2}{2\sigma_A^2(x_i)}\right). \qquad (22)$$

Observables as simple as the means of A are therefore not constrained by the knowledge of the entire correlation hierarchy of the lognormal field. While the effect is irrelevant in the linear regime (for, say, σ_δ = 0.1, the maximal value of the mean in equation (22) is only ≈ 10^{-215}), deep in the non-linear regime this is no longer the case. It is easy to show from the above expression that the range available to ⟨(A - Ā)(x_i)⟩, choosing ω appropriately, grows without bound, proportionally to σ_A. The means are thus left totally unconstrained in that regime. This, and the very sharp behavior, is of course a generic effect, not limited to that particular observable. It is obvious that the relevance of this effect for parameter inference is very sensitive to the degree of linearity of the field, and that large amounts of information are lost to the hierarchy in the high-variance regime.⁴
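The numbers quoted above follow from evaluating (22) for a single point, where the bound is attained. The sketch below (our illustration; the strongly non-linear variance is a hypothetical value) evaluates the closed form, cross-checks it against direct quadrature of the one-dimensional member of (19), and compares the largest shift over integer ω with the continuous maximum σ_A e^{-1/2} of πω exp(-π²ω²/(2σ_A²)), which illustrates the growth proportional to σ_A.

```python
import numpy as np

def mean_shift(eps, omega, s):
    """Eq. (22) for a single point (no correlations): eps*pi*omega*exp(-pi^2 omega^2 / (2 s))."""
    return eps * np.pi * omega * np.exp(-0.5 * (np.pi * omega) ** 2 / s)

def mean_shift_quadrature(eps, omega, s):
    """<z>, z = A - A_bar, by direct quadrature of the 1D member of eq. (19)."""
    z = np.linspace(-15.0 * np.sqrt(s), 15.0 * np.sqrt(s), 400_001)
    dz = z[1] - z[0]
    gauss = np.exp(-0.5 * z**2 / s) / np.sqrt(2.0 * np.pi * s)
    density = gauss * (1.0 + eps * np.sin(np.pi * omega * z / s))
    return np.sum(z * density) * dz

# Linear regime: sigma_delta = 0.1, so sigma_A^2 = ln(1.01) by eq. (18)
print(np.log10(mean_shift(1.0, 1, np.log1p(0.1**2))))   # about -214.9

# Hypothetical strongly non-linear value: sigma_A^2 = ln(1 + 10^2)
s_nl = np.log1p(10.0**2)
print(mean_shift(0.5, 1, s_nl), mean_shift_quadrature(0.5, 1, s_nl))

# Largest shift over integer omega, compared with the continuous bound sigma_A * exp(-1/2)
for s in (4.0, 25.0, 100.0):
    best = max(mean_shift(1.0, w, s) for w in range(1, 200))
    print(np.sqrt(s), best, np.sqrt(s) * np.exp(-0.5))
```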

3.3. Discrete family 

Fix again the dimensionality d, the vector Ā and the matrix Σ_A. For every integer-valued d-dimensional multi-index n, define a realization A_n of A as follows. Pick any point a = (a_1, ..., a_d), and set

$$A_{\mathbf n} := \bar A + \Sigma_A(\mathbf n - a). \qquad (23)$$

While a can in principle be anything, only components a_i ∈ [0, 1) actually define different grids. As usual, ρ is given by exponentiation,

$$\rho_{\mathbf n} = \exp\left(A_{\mathbf n}\right). \qquad (24)$$

Then assign to these realizations, parametrized by n, the probability

$$P_{\mathbf n} = \frac{1}{Z}\exp\left(-\frac12\,(A_{\mathbf n} - \bar A)\cdot\Sigma_A^{-1}(A_{\mathbf n} - \bar A)\right). \qquad (25)$$

These are the usual Gaussian probabilities for A_n, except that we only have a discrete set of field realizations. Note that this can be written, maybe more conveniently, as

$$P_{\mathbf n} = \frac{1}{Z}\exp\left(-\frac12\,(\mathbf n - a)\cdot\Sigma_A(\mathbf n - a)\right). \qquad (26)$$

Since Σ_A is positive definite, the normalization factor Z is seen to be well defined, as for more usual Gaussian integrals, and so are the probabilities. This discrete probability distribution has the same moments of ρ^n as the d-dimensional lognormal distribution with associated Ā and Σ_A, as proven in the appendix. Again, negative entries in n are allowed.

This family is clearly different from the previous, continuous one. Rather than modulating the lognormal distribution with an oscillating factor, it is a series of Dirac delta functions sampling the lognormal on the grid given by (23). The role of a is to shift the sample by a small amount. If a is set to zero, then A = Ā is part of the sample; otherwise it is not. The fact that this indeterminacy is irrelevant in the linear regime comes this time from realizing that, for any nice enough function f, the average of f converges to ⟨f⟩_LN by the trapezoidal rule of quadrature, since the grid spacing at which A is sampled in (23) becomes thinner and thinner. In the non-linear regime, however, the spacing is very large, leading again to large deviations. This is also illustrated in Figures 1 and 2 for the one-dimensional version, with shift parameter a = 0.25.
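In one dimension the discrete family samples A on the grid Ā + σ_A²(n - a) with Gaussian weights, and the moment identity follows from shifting the summation index n → n + k, which is allowed because k is an integer. The sketch below (our illustration, with a hypothetical mean) checks this numerically; the sum is truncated at |n| ≤ n_max, which must be large compared with 1/σ_A.

```python
import numpy as np

def discrete_family_moment(k, A_bar, s, a, n_max=100):
    """<rho^k> for the 1D discrete family of eqs. (23)-(26), with sigma_A^2 = s."""
    x = np.arange(-n_max, n_max + 1) - a        # n - a, truncated at |n| <= n_max
    A_n = A_bar + s * x                          # eq. (23): A sampled on a grid of spacing s
    w = np.exp(-0.5 * s * x**2)                  # eq. (26), unnormalized
    P = w / w.sum()                              # divide by Z
    return np.sum(P * np.exp(k * A_n))           # eq. (24): rho_n = exp(A_n)

A_bar, a = 0.3, 0.25                             # hypothetical mean; shift used in the figures
for s in (np.log1p(0.1**2), np.log(2.0)):        # sigma_delta = 0.1 and 1, via eq. (18)
    for k in range(-3, 4):
        lognormal = np.exp(k * A_bar + 0.5 * k**2 * s)
        print(s, k, discrete_family_moment(k, A_bar, s, a), lognormal)
```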

⁴ Among this family, it turns out that some observables, such as the variances ⟨(A - Ā)²(x_i)⟩, are always identical to σ_A²(x_i) for any choice of ε and ω. We do not attach any significance to this, since this is not the case for the discrete family, although closed analytical expressions cannot be obtained in that case.
4. CONNECTION TO SIMULATIONS AND DISCUSSION

One of us (Neyrinck 2011) analysed the Coyote Universe N-body simulation suite (Heitmann et al. 2010; Lawrence et al. 2010) in a box of volume V = 2.2 Gpc³ with 256³ cells, extracting the spectrum P(k) of A and δ over the range 0.02/Mpc < k < 0.6/Mpc, and comparing their statistical power on several cosmological parameters as a function of the largest wavenumber k_max (i.e. the smallest scale) included in the analysis. It was found that the spectrum of A has more constraining power on cosmological parameters than that of δ when non-linear scales are included in the analysis. We refer to that paper for more details on the procedures and results. In this framework, ρ is 1 + δ, and thus A = ln(1 + δ). The fields are statistically homogeneous and isotropic.

Given the considerations of the previous sections, and the fact that the density field is known to be somewhat close to lognormal, these results can hardly be considered surprising. The field A must indeed be closer to a Gaussian field for all values of the cosmological parameters, so that low-order N-point functions of A must contain a larger fraction of the information than those of δ (it is useful to remember that the full fields A and δ carry in all cases the very same total amount of information, since the mapping between them is parameter independent and invertible). In this section we want to go a step further than these qualitative considerations and make a quantitative comparison of these results with simple analytical methods using lognormal statistics.

4.1. Treating information in A as Gaussian. 

First, we need to make sure that a Gaussian description of the field A is reasonable, at least as far as the information content is concerned. In particular, this is not the case for the smallest scales of A, since the covariance matrix of P_A in the 256³ box clearly shows substantial off-diagonal elements starting from k ~ 0.3/Mpc. We therefore repeated the same analysis, performing the logarithmic transform on the δ field only after smoothing δ on twice the original length scale, by merging the 256³ cells into 128³ cells. This allowed us to extract the spectra of A and δ over the range 0.02/Mpc < k < 0.3/Mpc, with a diagonal covariance matrix over the full range to a very good approximation. It is important to realize that this is unfortunately not identical to the much simpler approach of considering the original A field only up to the new k_max: since all the scales of δ have an impact on the large scales of A, smoothing δ and then log-transforming it is not the same as log-transforming δ and then smoothing A.
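The non-commutativity of smoothing and log-transforming is elementary and can be illustrated on a toy field. The sketch below uses a hypothetical one-dimensional lognormal δ on 256 cells (not the simulation data) and merges neighbouring cells; since the logarithm is concave, Jensen's inequality makes the log of the averaged 1 + δ never smaller than the average of ln(1 + δ), with equality only where neighbouring cells are equal.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 1D lognormal density contrast on 256 cells (stand-in for the simulation field)
A_fine = rng.normal(-0.5, 1.0, size=256)
delta_fine = np.exp(A_fine) - 1.0

def merge_pairs(x):
    """Average neighbouring cells: 256 cells -> 128 cells (smoothing on twice the length scale)."""
    return 0.5 * (x[0::2] + x[1::2])

# Route used in the text: smooth delta, then take the log
A_smooth_then_log = np.log1p(merge_pairs(delta_fine))

# Simpler route: take the log, then smooth A
A_log_then_smooth = merge_pairs(np.log1p(delta_fine))

print(np.all(A_smooth_then_log >= A_log_then_smooth))   # Jensen's inequality, cell by cell
print(np.max(A_smooth_then_log - A_log_then_smooth))    # size of the largest discrepancy
```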

For a purely Gaussian field with spectrum P, the 
information content on a in the spectrum is given by 



F = Y. 

- 2 



d 3 k f din P(k) 



(2tt) 3 V da 
where the sum runs over the modes extracted, and 
1 



a 



A(a) 



(27) 



(28) 




™ -2.0 - 



Fig. 3. — Comparison of various estimates of the error bar on the 
linear power spectrum amplitude, ln<r|, constrained using power 
spectra of the overdensity 8 (black) and the log-density A (red) 
in an TV-body simulation. Solid curves show how the error bars 
tighten as the max imum k analyze d increases up to the Nyquist 
frequency, as in e.g. INev rinck 1 20130, equation 1291 . Dotted curves 
neglect the non-Gaussian component of the covariance matrices, 
as well a s th e discrete nature of the Fourier-space mode lattice, 
equation J2jj. The arrows (one for each choice of a a, 0.7 and 0.9) 
show the expected degradation of the error ba rs f rom analyzing 
8 instead of A in our model given by equation J39I I; these factors 
appear numerically in the first column of Table [J 

can be thought of as approximating the constraints on a 
achievable with these modes. We focus for reasons that 
become clear below primarily on the parameter In erg, 
which has a roughly constant impact both on lnP,5 and 
In Pa. In figure GU wc compare this for the <5 field and 
the A field as function of fc max . The solid lines are the 
simulation results, evaluating the covariance matrix Ckk' 
between the modes k and k' and setting 



A(lncr|) 



E 

fc,fc'<fe m 



dlnP(fc) x d\nP{k') 
diner 2 kk ' dlnal 



(29) 



while the dotted lines are in both cases equation (28), evaluated with (27), with the derivatives being those extracted from the simulations. Since the derivatives are roughly constant, the dotted lines scale like k_max^{-3/2}, i.e. as the inverse square root of the number of modes. It is clear that the log transform extends the (rough) validity of the Gaussian approximation, in terms of Fisher information, to the full range of scales we are dealing with. Note, however, that this is a statement only up to the four-point level, since the two- and four-point functions are the only ones that enter (27) and (29).
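As an illustration of equations (27) and (28), the following Python sketch counts the Fourier modes of a periodic cubic grid with 0 < |k| ≤ k_max (in units of the fundamental frequency k_f) and evaluates Δ(α), assuming a constant derivative ∂ln P/∂α = c. The value of c, the grid and the function name are hypothetical choices made for illustration, not taken from the simulations. Doubling k_max multiplies the number of modes by about 8, so Δ(α) should shrink by about 2^{3/2} ≈ 2.8, which is the k_max^{-3/2} scaling of the Gaussian curves.

```python
import numpy as np

def gaussian_spectrum_error(k_max_over_kf, c=1.0):
    """Delta(alpha) = F^{-1/2}, with F = (1/2) sum_k (dlnP/dalpha)^2 as in eqs. (27)-(28).

    Hypothetical setup: dlnP/dalpha = c for every mode, and the modes are the
    integer lattice points of a periodic box (units of k_f) with 0 < |k| <= k_max.
    """
    r = int(np.floor(k_max_over_kf))
    n = np.arange(-r, r + 1)
    nx, ny, nz = np.meshgrid(n, n, n, indexing="ij")
    k2 = nx**2 + ny**2 + nz**2
    n_modes = np.count_nonzero((k2 > 0) & (k2 <= k_max_over_kf**2))
    fisher = 0.5 * n_modes * c**2          # every mode contributes c^2 / 2
    return fisher ** -0.5, n_modes

for kmax in (10, 20, 40):
    err, n_modes = gaussian_spectrum_error(kmax)
    print(f"k_max = {kmax:2d} k_f: {n_modes:7d} modes, Delta(alpha) = {err:.5f}")
```

The mode count, not the value of c, controls the slope; a k-dependent derivative would simply weight each lattice point differently in the sum.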

4.2. Comparison to simulations 

To compare these results to analytical predictions from lognormal statistics, we first note the following. For a parameter, such as ln σ_8², that roughly obeys

$$ \frac{\partial \ln P_A(k)}{\partial \alpha} \simeq \text{const}, \qquad (30) $$



the correlated Gaussian field A is equivalent, from the point of view of the information on that parameter, to a field with the same variance but with ξ(r) = 0 for r > 0. This may not sound like an obvious statement, so let us show it explicitly. Start from equation (27), which leads to

$$ F_{\alpha\alpha} = \frac{V}{2}\left(\frac{\partial \ln P_A}{\partial \alpha}\right)^{2}\int \frac{d^{3}k}{(2\pi)^{3}}. \qquad (31) $$



In a discrete description, the integral ∫ d³k on the right is the number of available modes, equal to the number d of grid points, times the volume per mode Δk = (2π)³/V. It follows that

$$ F_{\alpha\alpha} = \frac{d}{2}\left(\frac{\partial \ln P_A}{\partial \alpha}\right)^{2}. \qquad (32) $$



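On the other hand, the observation of d uncorrelated Gaussian variables with variance σ_A² always carries the information

$$ F_{\alpha\alpha} = \frac{d}{2\sigma_A^{4}}\left(\frac{\partial \sigma_A^{2}}{\partial \alpha}\right)^{2} \qquad (33) $$

in their variances. If the derivative of ln P_A is the constant c, we have

$$ \frac{\partial \sigma_A^{2}}{\partial \alpha} = \int \frac{d^{3}k}{(2\pi)^{3}}\,\frac{\partial P_A(k)}{\partial \alpha} = \int \frac{d^{3}k}{(2\pi)^{3}}\,P_A(k)\,\frac{\partial \ln P_A(k)}{\partial \alpha} = c\,\sigma_A^{2}, \qquad (34) $$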



and thus expressions (32) and (33) are identical, both being equal to d c²/2. In terms of information on such parameters, the correlated Gaussian A field is therefore exactly equivalent to d uncorrelated Gaussian variables with the same variances. Such parameters can thus be seen as entering predominantly through the variance, i.e. the two-point correlation function at zero lag, which contains most of the information, while the correlations at nonzero lag carry little independent information. This is also expected to hold for the δ field: since it is very non-linear, the variance dominates over the clustering in the two-point correlation matrix, i.e. that matrix is close to diagonal, so that the variance will dominate in any covariance matrix as well as in the sensitivity to the parameter.
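This equivalence can also be checked numerically. The Python sketch below is a one-dimensional toy version under stated assumptions: it builds the circulant covariance of a periodic Gaussian field with a hypothetical spectrum P(k) = e^{cα}/(1 + k²), so that ∂ln P/∂α = c for every mode, and computes the Fisher information with the standard zero-mean Gaussian formula F = ½ Tr[(C⁻¹ ∂_α C)²], using a finite difference for ∂_α C. It then compares the result with the information (33) carried by d uncorrelated variables with the same variance ξ(0). Under these assumptions both should equal d c²/2, whatever the shape of the spectrum.

```python
import numpy as np

def fisher_gaussian(cov, dcov):
    """Fisher information of a zero-mean Gaussian vector: F = 1/2 Tr[(C^-1 dC)^2]."""
    m = np.linalg.solve(cov, dcov)
    return 0.5 * np.trace(m @ m)

def periodic_covariance(alpha, d=64, c=1.5):
    """Covariance of a 1D periodic Gaussian field with the hypothetical spectrum
    P(k) = exp(c * alpha) / (1 + k^2), so that dlnP/dalpha = c for all k."""
    k = 2 * np.pi * np.fft.fftfreq(d)            # grid frequencies
    power = np.exp(c * alpha) / (1.0 + k**2)
    xi = np.fft.ifft(power).real                 # xi(r) at lags r = 0..d-1
    lags = np.subtract.outer(np.arange(d), np.arange(d)) % d
    return xi[lags]                              # circulant covariance matrix

d, c, alpha, h = 64, 1.5, 0.3, 1e-5
cov = periodic_covariance(alpha, d, c)
dcov = (periodic_covariance(alpha + h, d, c) - periodic_covariance(alpha - h, d, c)) / (2 * h)

f_correlated = fisher_gaussian(cov, dcov)        # full correlated field, eq. (32)
var, dvar = cov[0, 0], dcov[0, 0]                # variance xi(0) and its derivative
f_uncorrelated = 0.5 * d * (dvar / var) ** 2     # d independent variables, eq. (33)
print(f_correlated, f_uncorrelated, 0.5 * d * c**2)
```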

Since information simply adds up for any number of independent variables, we can try to use directly the exact results one of us derived (Carron 2011) for the one-dimensional lognormal distribution to obtain a rough but still reasonable estimate of the improvement in the constraints from analyzing the A field. In that work the cumulative efficiencies

$$ \epsilon_N(\alpha) = \frac{1}{F_{\alpha\alpha}}\sum_{n=2}^{N} s_n^{2}(\alpha) \qquad (35) $$

of the first N moments of the δ field to capture the information in A were derived, with the coefficients s_n(α) defined there (equations 31-35 as well as figure 2 and figure 1, solid line, in that paper). These efficiencies are extremely sensitive functions of σ_A², decaying like exp(−4σ_A²) ≈ σ_δ^{−8} as soon as σ_A becomes close to unity.

A slight modification to these coefficients is needed before we can confront them with the simulations. Only the spectrum of A was extracted from the simulations, not the mean of A, which in principle also carries information, even if δ itself has zero mean. For a one-dimensional lognormal variable with unit mean, we have from equation (16) that Ā = −σ_A²/2. For that lognormal variable the total information is given by the usual formula for the Gaussian A,

$$ F_{\alpha\alpha} = \frac{1}{\sigma_A^{2}}\left(\frac{\partial \bar A}{\partial \alpha}\right)^{2} + \frac{1}{2\sigma_A^{4}}\left(\frac{\partial \sigma_A^{2}}{\partial \alpha}\right)^{2}. \qquad (36) $$

It thus reduces to



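$$ F_{\alpha\alpha} = \frac{1}{2\sigma_A^{4}}\left(\frac{\partial \sigma_A^{2}}{\partial \alpha}\right)^{2}\left(1 + \frac{\sigma_A^{2}}{2}\right), \qquad (37) $$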



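where the rightmost term, σ_A²/2, contains the part of the information in the mean of A. The efficiencies of the moments of δ relative to the information in the variance of A only, excluding the mean, thus become

$$ \epsilon^{A}_N(\alpha) = \epsilon_N(\alpha)\left(1 + \frac{\sigma_A^{2}}{2}\right). \qquad (38) $$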



Note that in principle these efficiencies can now be larger than unity, if the moments of δ were to capture not only the information in σ_A, but also that in Ā.
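Equation (37) can be checked by direct numerical integration. The following Python sketch takes α = σ_A² (so that ∂σ_A²/∂α = 1), computes the score of A ~ N(−σ_A²/2, σ_A²) by central finite differences in σ_A², and averages its square with Gauss-Hermite quadrature. The step size, number of nodes and function names are arbitrary illustrative choices. Since ρ = e^A is a one-to-one transform that does not depend on α, this is also the information in the unit-mean lognormal variable itself.

```python
import numpy as np

def fisher_unit_mean_lognormal(s, deg=40, h=1e-5):
    """Fisher information on s = sigma_A^2 carried by A ~ N(-s/2, s).

    The score d ln p / ds is taken by central finite differences at fixed A,
    and its mean square is computed with Gauss-Hermite quadrature.
    """
    x, w = np.polynomial.hermite.hermgauss(deg)
    a = -s / 2 + np.sqrt(2 * s) * x                # quadrature nodes for A

    def log_pdf(a, s):
        return -0.5 * np.log(2 * np.pi * s) - (a + s / 2) ** 2 / (2 * s)

    score = (log_pdf(a, s + h) - log_pdf(a, s - h)) / (2 * h)
    return np.sum(w * score**2) / np.sqrt(np.pi)

def fisher_eq37(s):
    """Equation (37) with alpha = sigma_A^2: variance part times (1 + sigma_A^2 / 2)."""
    return (1 + s / 2) / (2 * s**2)

for sigma_a in (0.7, 0.9):
    s = sigma_a**2
    print(sigma_a, fisher_unit_mean_lognormal(s), fisher_eq37(s), 1 / (2 * s**2))
```

The last printed column is the variance-only information 1/(2σ_A⁴); its ratio to the total is the factor (1 + σ_A²/2) that enters (38).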

The improvement factors, i.e. the ratios of the constraints on α from analyzing the first N correlation functions of δ to the constraint from the two-point function of A, are thus in this model

$$ \left[\epsilon^{A}_N\right]^{-1/2} = \frac{\Delta^{\delta}_N(\alpha)}{\Delta^{A}_2(\alpha)}. \qquad (39) $$



They are independent of the parameter α in this one-dimensional picture, since the only relevant parameter is σ_A², or equivalently σ_δ². Remember that the denominator on the right-hand side can actually be calculated for any lognormal field from (27); our additional assumptions can thus be seen as entering only the numerator. We argued that this ratio is expected to be correct for parameters such as ln σ_8², but it becomes exact in all cases for a lognormal field whose variance dominates the clustering sufficiently, ξ_δ(r)/σ_δ² ≪ 1 for all r. The effective nearest-neighbor distance given the modes we used can be evaluated as r_min ≈ [∫ d³k/(2π)³]^{−1/3}, and we find ξ_δ(r_min)/σ_δ² = 0.3.



Finally, there is a slight ambiguity in evaluating ε^A_N. A purely lognormal field has σ_A = [ln(1 + σ_δ²)]^{1/2}, but this relation is not fulfilled precisely in our simulations. We obtain σ_A = 0.7 and σ_δ = 1.1, so that [ln(1 + σ_δ²)]^{1/2} = 0.9 rather than 0.7. This discrepancy may of course be due to an intrinsic failure of the lognormal assumption, or to the presence of the smallest, slightly correlated scales, as seen from the onset of saturation in Figure 3.
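The conversion between σ_A and σ_δ used here is the exact lognormal relation σ_δ² = exp(σ_A²) − 1. A short Python sketch of it, assuming perfect lognormality (the function names are ours):

```python
import math

def sigma_delta(sigma_a):
    """rms of delta for a lognormal field: sigma_delta^2 = exp(sigma_A^2) - 1."""
    return math.sqrt(math.expm1(sigma_a**2))

def sigma_a_from_delta(sigma_d):
    """Inverse relation: sigma_A = [ln(1 + sigma_delta^2)]^(1/2)."""
    return math.sqrt(math.log1p(sigma_d**2))

print(round(sigma_a_from_delta(1.1), 2))   # 0.89, i.e. the 0.9 quoted above
print(round(sigma_delta(2.0), 1))          # 7.3
print(round(sigma_delta(3.0)))             # 90
```

The steep exponential growth of σ_δ with σ_A is what makes the moment efficiencies so sensitive to the variance of the field.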

We show in the first two rows of Table 1 the factors of improvement for these two values of σ_A, 0.7 and 0.9, for N = 2, 3 and ∞. The third row shows the improvement found by extracting P_A rather than P_δ in the simulations. Given our assumptions, and the very high sensitivity of ε^A_N to the variance of the field, they agree remarkably well: for the sake of comparison, a σ_A roughly twice as large, σ_A = 2 (σ_δ = 7.3), would have predicted a factor of Δ^δ_2/Δ^A_2 = 522, and σ_A = 3 (σ_δ = 90) a factor of ≈ 5·10⁶.

TABLE 1
Factors of improvement in constraints on parameters

                         Δ^δ_2/Δ^A_2   Δ^δ_3/Δ^A_2   Δ^δ_∞/Δ^A_2
LN, σ_A = 0.7                2.0           1.6           1.3
LN, σ_A = 0.9                2.9           2.4           2.1
Sim., α = ln σ_8²            2.5            -             -
Sim., α = n_s                2.1            -             -
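A rough way to compute lognormal improvement factors of this kind is sketched below in Python. It assumes, as a stand-in for the cumulative efficiencies of Carron (2011), that the information in the first N moments of a unit-mean lognormal variable ρ = 1 + δ is F_N = ∂mᵀ C⁻¹ ∂m, with m_n = ⟨ρⁿ⟩ = exp(n(n−1)σ_A²/2) and C_ij = m_{i+j} − m_i m_j, and it takes α = σ_A², which does not change the ratio (39). Under these assumptions σ_A = 0.9 gives factors of roughly 2.9 and 2.4 for N = 2 and 3, and σ_A = 2 gives roughly 522 for N = 2. The moment covariance becomes very badly conditioned as σ_A or N grows, so the sketch does not attempt N = ∞.

```python
import numpy as np

def lognormal_moments(s, n_max):
    """<rho^n> = exp(n(n-1)s/2) for a unit-mean lognormal, s = sigma_A^2, n = 0..n_max."""
    n = np.arange(n_max + 1)
    return np.exp(0.5 * n * (n - 1) * s)

def moment_information(s, N):
    """Information on alpha = s carried by the first N moments of rho (assumed
    here to be dm^T C^{-1} dm, the usual projection onto polynomials of degree <= N)."""
    m = lognormal_moments(s, 2 * N)
    n = np.arange(1, N + 1)
    dm = 0.5 * n * (n - 1) * m[n]                       # d m_n / d s (zero for n = 1)
    cov = m[n[:, None] + n[None, :]] - np.outer(m[n], m[n])
    return dm @ np.linalg.solve(cov, dm)

def improvement_factor(sigma_a, N):
    """(eps^A_N)^(-1/2) of eqs. (38)-(39), i.e. Delta_N^delta / Delta_2^A."""
    s = sigma_a**2
    f_variance_only = 1.0 / (2.0 * s**2)               # eq. (37) without the mean term
    return (moment_information(s, N) / f_variance_only) ** -0.5

for sigma_a, N in [(0.9, 2), (0.9, 3), (2.0, 2)]:
    print(f"sigma_A = {sigma_a}, N = {N}: {improvement_factor(sigma_a, N):.3g}")
```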

We also performed this analysis for the tilt parameter n_s, which by its very definition has a strongly differentiated impact over different modes, and found, just as in the original analysis (Neyrinck 2011b), that the improvement factor is roughly parameter independent, as shown in the fourth row of the table. This is another argument supporting the view that the dynamics of the information are indeed captured by such a simple picture. This may be due to the fact that the smallest scales, containing the largest number of modes, contribute the majority of the information in A for any parameter, so that the sensitivity can effectively be treated as constant, equal to its value on small scales, making our argument above valid for basically any parameter. Of course, this is much more speculative. Note that for both values of σ_A the spectrum of A still outperforms the entire hierarchy of δ by a sizeable factor in the lognormal model.

5. CONCLUSION AND DISCUSSION 

We have made clear that correlation functions are generically very poor descriptors and probes of fields with large tails. This is especially true for the lognormal field, a standard prescription for the statistics of cosmological non-linear fields, for which we provided other explicit fields with exactly the same hierarchy at all orders. We showed that knowledge of the entire hierarchy of N-point functions of a non-linear lognormal field is insufficient to constrain other, simple observables. We discussed the links between these aspects and the failure of power series expansions to reproduce relevant functions. We argued that this inadequacy is responsible for the recent successes of log transforms in cosmology at recapturing information, and that they may not only bring back information from higher-order statistics, but likely also information that cannot be probed at all with the hierarchy. We then showed that the factors of improvement in the constraints from analyzing the spectrum of A rather than that of δ, as seen in N-body simulations, are in quantitative agreement with simple analytical predictions using lognormal statistics.

Observational noise issues were not considered in this work. It therefore remains unclear to what extent these improvements can be achieved with actual galaxy survey data. Generically, it is reasonable to expect that noise will reduce these improvement factors. This work nonetheless makes clear that, in this case, improving the specifications of a survey in order to decrease the observational (e.g. shot) noise will at the same time reduce the efficiency with which cosmological parameters can be extracted with the hierarchy of δ (i.e. the fraction of the total information that is contained in the hierarchy).

Admittedly, the question of the incompleteness of the hierarchy of the matter field, or of any other field, is in itself to a certain extent academic, since high-order correlations will probably stay out of reach for a long time anyway. Nevertheless, it provides directions and insights into the recent successes of these transforms, strongly suggesting that in the non-linear regime an approach using transforms is much more promising than targeting higher-order statistics for inference on any parameter. We are also convinced that the statistical methods and formalism introduced here will be more widely applicable in the future. Progress on these issues will be reported in due time.

We are thankful to the anonymous referee. JC warmly thanks Alex Szalay, Xin Wang, and the Physics and Astronomy Department of Johns Hopkins University, where this work was conducted, for their hospitality. He also acknowledges the support of the Swiss National Foundation. MCN is grateful for support from the W.M. Keck and the Gordon and Betty Moore Foundations.

APPENDIX

We prove the claim made in this work that the distributions we defined have the same correlations as the lognormal at all orders. As we will see, this is also true including 'negative orders' and 'mixed orders', i.e. when negative powers of the variables are allowed in the correlations.

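Recall that for lognormal variables ρ = (ρ_1, ..., ρ_d) with means and covariance matrix of their logarithms Ā = (Ā_1, ..., Ā_d) and Σ_A we have

$$ m_{\mathbf n} := \langle \rho^{\mathbf n}\rangle = \langle \rho_1^{n_1}\cdots\rho_d^{n_d}\rangle = \exp\left(\mathbf n\cdot\bar A + \frac{1}{2}\,\mathbf n\cdot\Sigma_A\mathbf n\right), \qquad \mathbf n = (n_1,\ldots,n_d). \qquad (1) $$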

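A simple proof of this fact makes use of the standard formula for Gaussian integrals, valid for any positive definite matrix Σ_A, mean vector Ā and vector z, which can be complex valued:

$$ \int \frac{d^{d}A}{(2\pi)^{d/2}\det^{1/2}\Sigma_A}\,\exp\left(-\frac{1}{2}(A-\bar A)\cdot\Sigma_A^{-1}(A-\bar A) + z\cdot(A-\bar A)\right) = \exp\left(\frac{1}{2}\,z\cdot\Sigma_A z\right). \qquad (2) $$

Essentially all calculations in this work follow from this formula. Even the proof for the discrete family can be considered a discrete version of that relation.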
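Equation (1) can be verified numerically without sampling noise. The Python sketch below computes ⟨ρⁿ⟩ for a two-dimensional lognormal by tensor-product Gauss-Hermite quadrature over A and compares it with the closed form. The covariance, the choice of unit means Ā = −diag(Σ_A)/2 and the mixed-sign multi-index n = (2, −1) are hypothetical, and the helper name gaussian_expectation is ours.

```python
import numpy as np

def gaussian_expectation(f, mean, cov, deg=60):
    """E[f(A)] for A ~ N(mean, cov) in two dimensions, by tensor-product
    Gauss-Hermite quadrature (weight exp(-x^2), hence the sqrt(2) and 1/pi)."""
    x, w = np.polynomial.hermite.hermgauss(deg)
    xx, yy = np.meshgrid(x, x, indexing="ij")
    z = np.sqrt(2.0) * np.vstack([xx.ravel(), yy.ravel()])   # standard-normal nodes
    a = mean[:, None] + np.linalg.cholesky(cov) @ z           # nodes for A
    weights = np.outer(w, w).ravel() / np.pi
    return np.sum(weights * f(a))

cov = np.array([[0.5, 0.1], [0.1, 0.4]])     # hypothetical Sigma_A
mean = -0.5 * np.diag(cov)                    # hypothetical Abar (unit means)
n = np.array([2, -1])                         # mixed-sign multi-index

numeric = gaussian_expectation(lambda a: np.exp(n @ a), mean, cov)   # <rho^n>
closed_form = np.exp(n @ mean + 0.5 * n @ cov @ n)                   # equation (1)
print(numeric, closed_form)
```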

Continuous 

To prove our claim it is enough to show that

$$ \left\langle \rho^{\mathbf n}\,\sin\left(\pi\,\boldsymbol\omega\cdot\Sigma_A^{-1}(A-\bar A)\right)\right\rangle_{LN} = 0. \qquad (3) $$

This must hold for any d-dimensional multi-indices ω and n (we allow entries to be negative), where the average is taken with respect to the lognormal density function, equation (15). We proceed as follows: we evaluate the integral

$$ J(\mathbf n,\boldsymbol\omega) := \left\langle \rho^{\mathbf n}\,\exp\left(i\pi\,\boldsymbol\omega\cdot\Sigma_A^{-1}(A-\bar A)\right)\right\rangle_{LN}, \qquad (4) $$

and show that its imaginary part vanishes for ω and n as specified.
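Writing equation (4) using

$$ \rho^{\mathbf n} = \exp(\mathbf n\cdot A) = \exp\left[\mathbf n\cdot(A-\bar A) + \mathbf n\cdot\bar A\right] \qquad (5) $$

leads immediately to the Gaussian integral given in (2), with z = n + iπ Σ_A^{-1} ω. It follows from that equation that

$$ J(\mathbf n,\boldsymbol\omega) = \exp\left[\mathbf n\cdot\bar A + \frac{1}{2}\left(\mathbf n + i\pi\Sigma_A^{-1}\boldsymbol\omega\right)\cdot\Sigma_A\left(\mathbf n + i\pi\Sigma_A^{-1}\boldsymbol\omega\right)\right]. \qquad (6) $$

Separating the real from the imaginary part of the argument, this expression reduces to

$$ J(\mathbf n,\boldsymbol\omega) = \exp\left(\mathbf n\cdot\bar A + \frac{1}{2}\,\mathbf n\cdot\Sigma_A\mathbf n - \frac{\pi^{2}}{2}\,\boldsymbol\omega\cdot\Sigma_A^{-1}\boldsymbol\omega\right)\exp\left(i\pi\,\boldsymbol\omega\cdot\mathbf n\right). \qquad (7) $$

The imaginary part of that expression is thus proportional to sin(π ω·n). Whenever ω and n are integer valued, so is their scalar product ω·n = Σ_i ω_i n_i. Therefore the sine vanishes and (3) is proved.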
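The result (7), and the vanishing of (3) for integer multi-indices, can be checked the same way. This Python sketch repeats a Gauss-Hermite helper (so the block runs on its own, with more nodes because the integrand now oscillates) and uses a hypothetical Σ_A and Ā; the function names are ours. For integer n and ω the imaginary part of J should vanish to round-off and J should match (7); for a non-integer exponent such as n = (0.5, 0) it does not vanish, so fractional moments are not protected by this argument.

```python
import numpy as np

def gaussian_expectation(f, mean, cov, deg=100):
    """E[f(A)] for A ~ N(mean, cov) in 2D, by tensor-product Gauss-Hermite quadrature."""
    x, w = np.polynomial.hermite.hermgauss(deg)
    xx, yy = np.meshgrid(x, x, indexing="ij")
    z = np.sqrt(2.0) * np.vstack([xx.ravel(), yy.ravel()])
    a = mean[:, None] + np.linalg.cholesky(cov) @ z
    weights = np.outer(w, w).ravel() / np.pi
    return np.sum(weights * f(a))

cov = np.array([[0.5, 0.1], [0.1, 0.4]])     # hypothetical Sigma_A
mean = -0.5 * np.diag(cov)                    # hypothetical Abar
cov_inv = np.linalg.inv(cov)

def J_quadrature(n, omega):
    """J(n, omega) of equation (4), computed by quadrature."""
    def integrand(a):
        phase = np.pi * (omega @ cov_inv @ (a - mean[:, None]))
        return np.exp(n @ a + 1j * phase)
    return gaussian_expectation(integrand, mean, cov)

def J_closed_form(n, omega):
    """Equation (7)."""
    modulus = np.exp(n @ mean + 0.5 * n @ cov @ n
                     - 0.5 * np.pi**2 * omega @ cov_inv @ omega)
    return modulus * np.exp(1j * np.pi * (omega @ n))

for n, omega in [((1, 2), (1, 0)), ((-1, 3), (1, 1)), ((0.5, 0), (1, 0))]:
    n, omega = np.array(n, float), np.array(omega, float)
    print(n, omega, J_quadrature(n, omega), J_closed_form(n, omega))
```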



Discrete 

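From equation (20) we have

$$ \rho_{\mathbf n}^{\mathbf m} = \exp\left(\mathbf m\cdot\bar A + \mathbf m\cdot\Sigma_A(\mathbf n-\mathbf a)\right). \qquad (8) $$

It follows that the moments of ρ are given by

$$ \langle\rho^{\mathbf m}\rangle = \frac{e^{\mathbf m\cdot\bar A}}{Z}\sum_{\mathbf n\in\mathbb Z^{d}}\exp\left(-\frac{1}{2}(\mathbf n-\mathbf a)\cdot\Sigma_A(\mathbf n-\mathbf a) + \mathbf m\cdot\Sigma_A(\mathbf n-\mathbf a)\right). \qquad (9) $$

The proof is based on completing the square in the exponent, in perfect analogy with standard proofs of the Gaussian integral in (2). Write

$$ -\frac{1}{2}(\mathbf n-\mathbf a)\cdot\Sigma_A(\mathbf n-\mathbf a) + \mathbf m\cdot\Sigma_A(\mathbf n-\mathbf a) = -\frac{1}{2}(\mathbf n-\mathbf m-\mathbf a)\cdot\Sigma_A(\mathbf n-\mathbf m-\mathbf a) + \frac{1}{2}\,\mathbf m\cdot\Sigma_A\mathbf m, \qquad (10) $$

and then perform the shift of summation index n → n + m, obtaining

$$ \langle\rho^{\mathbf m}\rangle = \exp\left(\mathbf m\cdot\bar A + \frac{1}{2}\,\mathbf m\cdot\Sigma_A\mathbf m\right)\frac{1}{Z}\sum_{\mathbf n\in\mathbb Z^{d}}\exp\left(-\frac{1}{2}(\mathbf n-\mathbf a)\cdot\Sigma_A(\mathbf n-\mathbf a)\right). \qquad (11) $$

Since the sum ranges over all multi-indices and m has integer entries, the shift does not create boundary terms. This last sum is nothing else than Z, so that we recover

$$ \langle\rho^{\mathbf m}\rangle = \exp\left(\mathbf m\cdot\bar A + \frac{1}{2}\,\mathbf m\cdot\Sigma_A\mathbf m\right), \qquad (12) $$

which are indeed the same as the lognormal moments in (1). Again, this is also true if negative entries in m are permitted.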
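The discrete result (12) can likewise be checked by brute-force summation. The Python sketch assumes, as equations (8) and (9) indicate, that the discrete family assigns probabilities proportional to exp(−½(n−a)·Σ_A(n−a)) to the points ln ρ_n = Ā + Σ_A(n−a), n ∈ Z². The values of Σ_A, Ā, a and the truncation |n_i| ≤ 30 are hypothetical choices; the truncated tails are far below double precision here.

```python
import numpy as np

cov = np.array([[0.5, 0.1], [0.1, 0.4]])      # hypothetical Sigma_A
mean = -0.5 * np.diag(cov)                     # hypothetical Abar
a_shift = np.array([0.3, -0.2])                # hypothetical offset a
K = 30                                         # truncation |n_i| <= K of Z^2

grid = np.arange(-K, K + 1)
lattice = np.stack(np.meshgrid(grid, grid, indexing="ij"), axis=-1).reshape(-1, 2)
q = lattice - a_shift                                     # n - a for every lattice point
log_weight = -0.5 * np.einsum("ki,ij,kj->k", q, cov, q)   # -(n-a).Sigma_A(n-a)/2
prob = np.exp(log_weight)
prob /= prob.sum()                                        # division by Z
log_rho = mean + q @ cov                                  # ln rho_n = Abar + Sigma_A (n-a)

def discrete_moment(m):
    """<rho^m> under the discrete distribution, by direct summation (eq. 9)."""
    return np.sum(prob * np.exp(log_rho @ m))

def lognormal_moment(m):
    """Lognormal moments, equations (1) and (12)."""
    return np.exp(m @ mean + 0.5 * m @ cov @ m)

for m in [(1, 2), (-1, 3), (3, 0)]:
    m = np.array(m)
    print(m, discrete_moment(m), lognormal_moment(m))
```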



